Analysis:
1. The dataset contains 130,443 rows and 17 columns.
2. The columns hold geographical information as well as model details for the vehicles registered with the Washington State Department of Licensing (DOL).
3. DOL Vehicle ID is the unique identifier for each record.
4. The VIN (1-10) column is not unique, which may seem counterintuitive, but the VIN is truncated to its first 10 characters: positions 1-3 give the world manufacturer identifier, positions 4-8 the vehicle descriptor section, position 9 is a check digit, and position 10 encodes the model year. The serial number that distinguishes individual vehicles sits in the truncated part.
5. There are a few missing values. The Model and Legislative District columns in particular have quite a few, which we will impute or drop later.
6. We also drop the DOL Vehicle ID and 2020 Census Tract columns, as they are not relevant to our analysis.
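To make the VIN layout concrete, here is a minimal sketch that splits a 10-character VIN prefix into its parts. The names `split_vin_prefix`, `VinPrefix` and `YEAR_CODES` are hypothetical, the example prefix is only illustrative, and the year table assumes model years 2010-2023 (the same letters also stand for 1980-1993 in the 30-year VIN cycle).

```python
from typing import NamedTuple, Optional

# Model-year codes for 2010-2023 only (assumption: older vehicles are ignored;
# the letters I, O and Q are never used in a VIN).
YEAR_CODES = dict(zip("ABCDEFGHJKLMNP", range(2010, 2024)))


class VinPrefix(NamedTuple):
    wmi: str                   # positions 1-3: world manufacturer identifier
    vds: str                   # positions 4-8: vehicle descriptor section
    check_digit: str           # position 9
    model_year: Optional[int]  # decoded from position 10, None if unknown code


def split_vin_prefix(vin10: str) -> VinPrefix:
    """Split a 10-character VIN prefix into its standard sections."""
    if len(vin10) != 10:
        raise ValueError("expected a 10-character VIN prefix")
    return VinPrefix(
        wmi=vin10[0:3],
        vds=vin10[3:8],
        check_digit=vin10[8],
        model_year=YEAR_CODES.get(vin10[9]),
    )


# Illustrative prefix, not taken from the dataset.
example = split_vin_prefix("5YJ3E1EA5K")
print(example)
```

Because the serial number is not part of the prefix, many vehicles can share the same 10 characters, which is why the column is not unique.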
Result:
1. All missing data has been treated.
2. The dataset also has hidden missing values disguised as 0.
3. The missing values in the Model column can be imputed from the VIN column, which encodes the model information.
4. A few columns have only 3 missing values, so the affected rows can be dropped directly.
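One possible way to impute Model from the VIN is sketched below. It is an assumption, not necessarily the method used in this analysis: rows are keyed on the first 8 VIN characters (manufacturer plus descriptor section), and a missing Model is filled with the most common Model among rows sharing that key. The function name `impute_model_from_vin` and the toy rows are hypothetical.

```python
import pandas as pd


def impute_model_from_vin(df, vin_col="VIN (1-10)", model_col="Model"):
    """Fill missing models using other rows with the same first 8 VIN characters."""
    key = df[vin_col].str[:8]  # WMI + vehicle descriptor section
    known = df[model_col].notna()
    # Most frequent known model for each VIN key.
    lookup = (
        df.loc[known, model_col]
        .groupby(key[known])
        .agg(lambda s: s.mode().iloc[0])
    )
    out = df.copy()
    out[model_col] = out[model_col].fillna(key.map(lookup))
    return out


# Toy data for illustration only.
toy = pd.DataFrame({
    "VIN (1-10)": ["5YJ3E1EA5K", "5YJ3E1EA7L", "1N4AZ1CP0J", "WBY00000XM"],
    "Model": ["MODEL 3", None, "LEAF", None],
})
filled = impute_model_from_vin(toy)
print(filled)
```

Rows whose VIN key never appears with a known Model stay missing and would still need to be dropped or handled separately.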
Conclusion: Taken together, the two graphs suggest that Tesla's Model 3 is the best-selling model in the electric vehicle market and that Tesla is the leading make, followed by Nissan, Chevrolet, Ford and BMW.
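The ranking behind such graphs can be reproduced with simple counts. This is a minimal sketch on made-up rows; the column names `Make` and `Model` and the helper `top_counts` are assumptions.

```python
import pandas as pd


def top_counts(df, col, n=5):
    """Return the n most frequent values of a column with their counts."""
    return df[col].value_counts().head(n)


# Toy data for illustration only.
toy = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "NISSAN", "NISSAN", "FORD"],
    "Model": ["MODEL 3", "MODEL 3", "MODEL Y", "LEAF", "LEAF", "MUSTANG MACH-E"],
})
top_makes = top_counts(toy, "Make")
top_models = top_counts(toy, "Model", n=2)
print(top_makes)
print(top_models)
```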
Insights:
1. The EV industry has grown rapidly over the last decade.
2. The year-over-year percent change shows a consistent rise across the decade, except for 2019, which saw a sharp decline followed by a quick recovery.
3. The percent-change graph does not show data for 2023 because the year is still in progress, but we can expect the upward trend to continue.
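A year-over-year percent change like the one described can be computed as follows. This is a sketch under assumptions: the input is a series of model years (one entry per vehicle), and the helper `yearly_pct_change` and its `incomplete_year` parameter are hypothetical names.

```python
import pandas as pd


def yearly_pct_change(years, incomplete_year=None):
    """Percent change in vehicle counts from one year to the next."""
    counts = years.value_counts().sort_index()
    if incomplete_year is not None:
        # Leave out a year that is still in progress.
        counts = counts.drop(incomplete_year, errors="ignore")
    # The first year has no predecessor, so its change is dropped.
    return counts.pct_change().mul(100).dropna()


# Toy data for illustration only: 2 vehicles in 2018, 1 in 2019, 3 in 2020, 1 in 2023.
toy_years = pd.Series([2018, 2018, 2019, 2020, 2020, 2020, 2023])
change = yearly_pct_change(toy_years, incomplete_year=2023)
print(change)
```

Dropping the incomplete year before calling `pct_change` avoids showing a misleading drop for a year whose count is still growing.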
Caution:
1. The 'Electric Range' and 'Base MSRP' columns have missing values disguised as 0.
2. Around 41% of Electric Range values and around 97% of Base MSRP values are missing.
3. We must exclude these values, or they will distort our analysis.
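One way to expose the disguised zeros is to convert them to NaN, so that pandas skips them in means and counts. A minimal sketch on toy data follows; the helper name `mask_zero_as_missing` is hypothetical and the column names are assumed to match the dataset.

```python
import numpy as np
import pandas as pd


def mask_zero_as_missing(df, cols=("Electric Range", "Base MSRP")):
    """Return a copy of df where 0 in the given columns is replaced by NaN."""
    out = df.copy()
    cols = list(cols)
    out[cols] = out[cols].replace(0, np.nan)
    return out


# Toy data for illustration only.
toy = pd.DataFrame({
    "Electric Range": [220, 0, 0, 150],
    "Base MSRP": [0, 0, 0, 69900],
})
masked = mask_zero_as_missing(toy)
# Share of missing values per column, in percent.
missing_pct = masked.isna().mean() * 100
print(missing_pct)
```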
Insights:
1. The data contains retail price information for only around 11 models.
2. Porsche is the most expensive of all brands.
3. We can use the treemap below to understand how the model counts are distributed, because the average MSRP depends heavily on the number of instances and the number of models each make has.
4. Since this analysis is based on only about 3% of the data, it is impractical to determine the actual price. (The analysis is meant to show how to visualize this kind of problem.)
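To see how much each average rests on, the average MSRP can be reported next to the number of priced vehicles and distinct models per make. This is a sketch on toy data, assuming columns named `Make`, `Model` and `Base MSRP`; the helper `msrp_summary` is hypothetical.

```python
import pandas as pd


def msrp_summary(df):
    """Average MSRP per make, with the counts that the average is based on."""
    priced = df[df["Base MSRP"] > 0]  # ignore zeros that stand for missing prices
    summary = priced.groupby("Make").agg(
        avg_msrp=("Base MSRP", "mean"),
        n_vehicles=("Base MSRP", "size"),
        n_models=("Model", "nunique"),
    )
    return summary.sort_values("avg_msrp", ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "Make": ["PORSCHE", "PORSCHE", "TESLA", "TESLA", "TESLA"],
    "Model": ["PANAMERA", "TAYCAN", "MODEL S", "MODEL X", "MODEL 3"],
    "Base MSRP": [100000, 0, 60000, 70000, 0],
})
summary = msrp_summary(toy)
print(summary)
```

A make whose average comes from a single priced vehicle deserves much less weight than one backed by many rows, which is the point the counts make visible.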